Simplify query for runs to export for samples and data during folder export#7448
Simplify query for runs to export for samples and data during folder export#7448
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request refactors the folder export logic for sample types and data classes to reduce memory usage by replacing in-memory filtering with direct SQL queries. Instead of loading all ExpData, ExpMaterial, and ExpRun objects into memory and filtering them in Java, the new implementation queries for run IDs directly from the database with appropriate filters.
Changes:
- Replaced memory-intensive object loading and Java-based filtering with SQL-based queries for determining which runs to export
- Added two new methods to ExperimentService API:
getDerivationRunIdsForDataClassExportandgetDerivationRunIdsForSampleTypesExport - Removed intermediate data structures and helper methods (
isValidRunType) that were used for in-memory filtering
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| experiment/src/org/labkey/experiment/samples/SampleTypeFolderWriter.java | Simplified run selection by replacing material loading and filtering with direct SQL query; removed isValidRunType method |
| experiment/src/org/labkey/experiment/samples/DataClassFolderWriter.java | Simplified run selection by replacing data loading and filtering with direct SQL query; removed isValidRunType method |
| experiment/src/org/labkey/experiment/api/ExperimentServiceImpl.java | Implemented two new SQL-based query methods that return filtered run IDs for folder export |
| api/src/org/labkey/api/exp/api/ExperimentService.java | Added interface declarations for the two new query methods |
Comments suppressed due to low confidence (3)
experiment/src/org/labkey/experiment/samples/SampleTypeFolderWriter.java:106
- The comment states "only want the sample derivation runs" but the implementation also includes aliquot runs (SAMPLE_ALIQUOT_PROTOCOL_LSID). Consider updating the comment to clarify that both derivation and aliquot protocol runs are included.
// only want the sample derivation runs; other runs will get included in the experiment xar.
api/src/org/labkey/api/exp/api/ExperimentService.java:811
- The JavaDoc comment could be more detailed. Consider expanding it to explain: (1) what "derivation run IDs" means in this context, (2) that it includes runs where materials from the specified sample types are used as either inputs or outputs, (3) the behavior of the includeRunsWithDataIO parameter more explicitly (when true, includes all derivation and aliquot runs; when false, includes aliquot runs and only derivation runs without data inputs/outputs).
/** Get derivation/aliquot run IDs for sample types — filtered by protocol and optionally excluding runs with data inputs/outputs */
List<Long> getDerivationRunIdsForSampleTypesExport(Collection<String> sampleTypeLsids, Container c, boolean includeRunsWithDataIO);
api/src/org/labkey/api/exp/api/ExperimentService.java:808
- The JavaDoc comment could be more detailed. Consider expanding it to explain: (1) that it returns run IDs where the specified data class is used as either input or output, (2) that it only includes runs with the SAMPLE_DERIVATION_PROTOCOL that have no material inputs or outputs (since those are handled by the sample type writer).
/** Get derivation run IDs for a data class — runs with SAMPLE_DERIVATION_PROTOCOL that have no material inputs/outputs */
List<Long> getDerivationRunIdsForDataClassExport(long dataClassRowId);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Rationale
DataClassFolderWriter.write() and SampleTypeFolderWriter.write() follow the problematic pattern that has the potential of large memory usage.
This PR simplifies the process by getting rid of the intermediate steps to query for all materials and runs, and instead query for run.rowId directly based on specified criteria.
Related Pull Requests
Changes